A Cyberpunk 2077 perspective on the prediction and understanding of future technology
Science fiction and video games have long served as valuable tools for
envisioning and inspiring future technological advancements. This position
paper investigates the potential of Cyberpunk 2077, a popular science fiction
video game, to shed light on the future of technology, particularly in the
areas of artificial intelligence, edge computing, augmented humans, and
biotechnology. By analyzing the game's portrayal of these technologies and
their implications, we aim to understand the possibilities and challenges that
lie ahead. We discuss key themes such as neurolink and brain-computer
interfaces, multimodal recording systems, virtual and simulated reality,
digital representation of the physical world, augmented and AI-based home
appliances, smart clothing, and autonomous vehicles. The paper highlights the
importance of designing technologies that can coexist with existing preferences
and systems, considering the uneven adoption of new technologies. Through this
exploration, we emphasize the potential of science fiction and video games like
Cyberpunk 2077 as tools for guiding future technological advancements and
shaping public perception of emerging innovations. Comment: 12 pages, 7 figures
Evaluation of real-time LBP computing in multiple architectures (Journal of Real-Time Image Processing manuscript)
Local Binary Pattern (LBP) is a texture operator used in many computer vision applications that often require real-time operation on multiple computing platforms. The emergence of new video standards has increased typical resolutions and frame rates, which demand considerable computational performance. Since LBP is essentially a pixel operator whose cost scales with image size, straightforward implementations are usually insufficient to meet these requirements. To identify the solutions that maximize the performance of real-time LBP extraction, we compare a series of different implementations in terms of computational performance and energy efficiency, and analyze the optimizations that can be made to reach real-time performance on multiple platforms with their different available computing resources. Our contribution is an extensive survey of the LBP implementations on different platforms that can be found in the literature. To provide a more complete evaluation, we have implemented the LBP algorithms on several platforms, such as Graphics Processing Units, mobile processors and a hybrid programming model image coprocessor. We have also extended the evaluation of some of the solutions found in previous work. In addition, we publish the source code of our implementations.
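As an illustration of why LBP is described above as a pixel operator that scales with image size, the following is a minimal sketch of the basic 3x3, 8-neighbour LBP variant in plain Python. It is not one of the optimized implementations the paper evaluates; the function name `lbp_3x3`, the neighbour ordering and the `>=` threshold convention are assumptions chosen for this example.

```python
def lbp_3x3(image):
    """Return basic 8-neighbour LBP codes for the interior pixels of a 2D grid.

    `image` is a list of equally long rows of grey levels. Border pixels are
    skipped, so the result is two rows and two columns smaller than the input.
    """
    # Neighbour offsets (row, col), clockwise from the top-left corner.
    # The ordering is an assumption; other orderings give permuted codes.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 6, 8]]
print(lbp_3x3(patch))  # [[58]]
```

Every interior pixel costs eight comparisons, so the work grows linearly with the number of pixels, which is the property that makes naive versions struggle at high resolutions and frame rates.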
Kinship Verification from Videos using Spatio-Temporal Texture Features and Deep Learning
Automatic kinship verification using facial images is a relatively new and
challenging research problem in computer vision. It consists of automatically
predicting whether two persons have a biological kin relation by examining
their facial attributes. While most of the existing works extract shallow
handcrafted features from still face images, we approach this problem from a
spatio-temporal point of view and explore the use of both shallow texture
features and deep features for characterizing faces. Promising results,
especially those of deep features, are obtained on the benchmark UvA-NEMO Smile
database. Our extensive experiments also show the superiority of using videos
over still images, hence pointing out the important role of facial dynamics in
kinship verification. Furthermore, the fusion of the two types of features
(i.e. shallow spatio-temporal texture features and deep features) shows
significant performance improvements compared to state-of-the-art methods. Comment: 7 pages
Improving Depression estimation from facial videos with face alignment, training optimization and scheduling
Deep learning models have shown promising results in recognizing depressive
states using video-based facial expressions. While successful models typically
leverage 3D-CNNs or video distillation techniques, the varying use of
pretraining, data augmentation, preprocessing, and optimization techniques
across experiments makes it difficult to make fair architectural comparisons.
We propose instead to enhance two simple models based on ResNet-50 that use
only static spatial information by using two specific face alignment methods
and improved data augmentation, optimization, and scheduling techniques. Our
extensive experiments on benchmark datasets obtain similar results to
sophisticated spatio-temporal models for single streams, while the score-level
fusion of two different streams outperforms state-of-the-art methods. Our
findings suggest that specific modifications in the preprocessing and training
process result in noticeable differences in the performance of the models and
could hide the actual differences originally attributed to the use of different neural
network architectures. Comment: 5 pages
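The score-level fusion of two streams mentioned above can be illustrated with a minimal sketch: each stream produces one depression score per video, and the fused score is a weighted average of the two. The function name `fuse_scores`, the equal default weighting and the example numbers are assumptions for illustration only; the abstract does not state which fusion rule the authors use.

```python
def fuse_scores(scores_a, scores_b, weight_a=0.5):
    """Fuse per-sample scores from two models by weighted averaging.

    `weight_a` is the weight given to the first stream; the second stream
    receives `1 - weight_a`. Equal weighting is an assumption of this sketch.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both streams must score the same samples")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(scores_a, scores_b)]


# Hypothetical scores predicted for two videos by two differently aligned streams.
stream_a = [10.0, 4.0]
stream_b = [14.0, 6.0]
print(fuse_scores(stream_a, stream_b))  # [12.0, 5.0]
```

Fusing at the score level keeps the two models independent, so each stream can be trained and tuned separately before their outputs are combined.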
Audio-Based Classification of Respiratory Diseases using Advanced Signal Processing and Machine Learning for Assistive Diagnosis Support
In global healthcare, respiratory diseases are a leading cause of mortality,
underscoring the need for rapid and accurate diagnostics. To advance rapid
screening techniques via auscultation, our research focuses on employing one of
the largest publicly available medical databases of respiratory sounds to train
multiple machine learning models able to classify different health conditions.
Our method combines Empirical Mode Decomposition (EMD) and spectral analysis to
extract physiologically relevant biosignals, closely tied to cardiovascular and
respiratory patterns, from acoustic data, which sets our approach apart from
conventional audio feature extraction practices. We use Power
Spectral Density analysis and filtering techniques to select Intrinsic Mode
Functions (IMFs) strongly correlated with underlying physiological phenomena.
These biosignals undergo a comprehensive feature extraction process for
predictive modeling. Initially, we deploy a binary classification model that
demonstrates a balanced accuracy of 87% in distinguishing between healthy and
diseased individuals. Subsequently, we employ a six-class classification model
that achieves a balanced accuracy of 72% in diagnosing specific respiratory
conditions like pneumonia and chronic obstructive pulmonary disease (COPD). For
the first time, we also introduce regression models that estimate age and body
mass index (BMI) based solely on acoustic data, as well as a model for gender
classification. Our findings underscore the potential of this approach to
significantly enhance assistive and remote diagnostic capabilities. Comment: 5 pages, 2 figures, 3 tables, Conference paper
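The balanced accuracy figures reported above are the mean of the per-class recalls, which prevents a majority class from dominating the score. The sketch below computes it in plain Python; the function name `balanced_accuracy` and the toy labels are assumptions for illustration and are not data from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Return the mean per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    recalls = []
    for cls in sorted(set(y_true)):
        # Number of samples that truly belong to this class.
        total = sum(1 for t in y_true if t == cls)
        # Number of those samples that were predicted correctly.
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Toy, imbalanced example: 2 healthy and 8 diseased recordings.
y_true = ["healthy"] * 2 + ["diseased"] * 8
y_pred = ["healthy", "diseased"] + ["diseased"] * 8
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

In this toy case the plain accuracy is 0.9, yet half of the healthy recordings are misclassified; the balanced score of 0.75 reflects that.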
Natural course of septo-optic dysplasia: Retrospective analysis of 20 cases
Introduction. Septo-optic dysplasia (SOD) is the variable combination of signs of dysgenesis of the midline of the brain, hypoplasia of the optic nerves and hypothalamic-pituitary dysfunction, which is sometimes associated with a varied spectrum of malformations of the cerebral cortex. Aims. To describe the natural history and neuroimaging findings in a series of 20 diagnosed patients. Patients and methods. We retrospectively review the epidemiological, clinical and neuroradiological characteristics of 20 consecutive patients diagnosed with SOD between January 1985 and January 2010. Data from computerised tomography, magnetic resonance imaging of the head, electroencephalogram, visual evoked potentials, ophthalmological evaluation, karyotyping and endocrinological studies were analysed. In seven patients, the homeobox gene HESX1 was studied. Results. Sixty per cent of the cases had pathological antecedents in the first trimester of gestation, with normal foetal ultrasound scans. Clinically, the most striking features were visual manifestations (85%), endocrine disorders (50%), mental retardation (60%) and epileptic seizures (55%). Fifty-five per cent were associated with abnormal neuronal migration. In 45%, SOD was the only neuroimaging finding. Karyotyping was performed in all cases, with normal results. The HESX1 gene study was positive in two of the seven cases studied (both with isolated SOD). None of those with a mutation in HESX1 presented familial consanguinity. No genetic study was conducted on the parents. Conclusions. SOD must be classified as a heterogeneous malformation syndrome that is associated with multiple brain, ocular, endocrine and systemic anomalies. The most severe forms are associated with abnormal neuronal migration and cortical organisation.
Introducing VTT-ConIot: A Realistic Dataset for Activity Recognition of Construction Workers Using IMU Devices
Sustainable work aims at improving working conditions to allow workers to effectively extend their working life. In this context, occupational safety and well-being are major concerns, especially in labor-intensive fields such as construction-related work. Internet of Things and wearable sensors provide unobtrusive technology that could enhance safety using human activity recognition (HAR) techniques and have the potential to improve work conditions and health. However, the research community lacks commonly used standard datasets that provide realistic and varied activities from multiple users. In this article, our contributions are threefold. First, we present VTT-ConIoT, a new publicly available dataset for the evaluation of HAR from inertial sensors in professional construction settings. The dataset, which contains data from 13 users and 16 different activities, is collected from three different wearable sensor locations. Second, we provide a benchmark baseline for human activity recognition that shows a classification accuracy of up to 89% for a six-class setup and up to 78% for a more granular sixteen-class one. Finally, we analyze the representativeness and usefulness of the dataset by comparing it with data collected in a pilot study carried out in a real construction environment with real workers.
Designing for energy-efficient vision-based interactivity on mobile devices
Abstract
Future multimodal mobile platforms are expected to require high interactivity in their applications and user interfaces. Until now, mobile devices have been designed to remain in a stand-by state until the user actively turns them on for interaction. The motivation for this approach has been battery conservation.
Imaging is a versatile sensing modality that can enable context recognition, unobtrusively predicting the user's interaction needs and directing the computational resources accordingly. However, vision-based always-on functionalities have been impractical in battery-powered devices, since their requirements of computational power and energy make their use unattainable for extended periods of time.
Vision-based applications can benefit from the addition of interactive stages that, properly designed, can reduce the complexity of the methods utilizing user feedback and collaboration, resulting in a system that balances computational throughput and energy efficiency.
The usability of user interfaces depends critically on their latency. However, an always-on sensing platform needs to balance this carefully against power consumption demands. When designing highly interactive vision-based interfaces, reactiveness can be improved by reducing the number of operations that the application processor needs to execute and offloading the most expensive tasks to accelerators or specific processors.
In this context, this thesis focuses on investigating and surveying enablers and solutions for vision-based interactivity on mobile devices. The thesis explores the development of new user interaction methods by analyzing and comparing means to reach interactivity, high performance, low latency and energy efficiency. The researched techniques, ranging from mobile GPGPU and dedicated sensor processing to reconfigurable image processors, provide insight into designing for future mobile platforms.
Face2PPG: An unsupervised pipeline for blood volume pulse extraction from faces
Photoplethysmography (PPG) has become a key technology in many
fields such as medicine, well-being, or sports. Our work proposes a set of
pipelines to extract remote PPG signals (rPPG) from the face, robustly,
reliably, and in a configurable manner. We identify and evaluate the possible
choices in the critical steps of unsupervised rPPG methodologies. We evaluate a
state-of-the-art processing pipeline in six different datasets, incorporating
important corrections in the methodology that ensure reproducible and fair
comparisons. In addition, we extend the pipeline by proposing three novel
ideas: 1) a new method to stabilize the detected face based on a rigid mesh
normalization; 2) a new method to dynamically select the different regions in
the face that provide the best raw signals; and 3) a new RGB to rPPG
transformation method called Orthogonal Matrix Image Transformation (OMIT)
based on QR decomposition, that increases robustness against compression
artifacts. We show that all three changes introduce noticeable improvements in
retrieving rPPG signals from faces, obtaining state-of-the-art results compared
with unsupervised, non-learning-based methodologies, and in some databases,
very close to supervised, learning-based methods. We perform a comparative
study to quantify the contribution of each proposed idea. In addition, we
present a series of observations that could help in future implementations. Comment: 20 pages, 10 figures